source-zendesk-support: fix Groups and Organizations streams #2007

Alex-Bair · 2024-10-01T22:24:12Z

Description:

The Groups and Organizations streams were missing data. This PR addresses the issues causing these streams to miss data.

`Groups`

Deleted groups were not being retrieved previously. By including the exclude_deleted query param and setting it to false, the Zendesk Support API is now returning deleted groups.

`Organizations`

Only 10k organizations were being retrieved previously. The Organizations stream was using page numbers to paginate through results. This is an older, but still supported, pagination method for Zendesk Support, and they only allow you to paginate through 10k results even if there are more. Using the page number based pagination method was causing us to only retrieve 10k documents and miss any remaining ones.

I've updated the Organizations stream to use Zendesk's time-based incremental export endpoint. This endpoint paginates by providing a start time, and all resources created/updated at or after that start time are returned. This lets us retrieve all resources instead of just a subset. This pagination strategy does have the potential downside of getting stuck in a loop if 1000+ (the max number of results per page) resources are updated at the exact same time, but there aren't currently better ways to retrieve incremental updates for Organizations.

Capture snapshot changes are expected. The new time-based incremental export endpoint adds a deleted_at property to each organization resource.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested on a local stack with a Zendesk account that had deleted groups and more than 10k organizations. Confirmed:

deleted groups are now retrieved
more than 10k organizations are retrieved
updated tests pass

All Organizations bindings for existing tasks need to be backfilled after merging to clear out previous state.

This change is

williamhbaker · 2024-10-02T14:12:56Z

source-zendesk-support/source_zendesk_support/streams.py

+    """
+    Converts a date-time formatted string to a millisecond UNIX timestamp.
+    """
+    return int(datetime.fromisoformat(dt_str).timestamp())


I think this is actually going to return seconds - the float result has the integer part as seconds, and the fractional part as microseconds. We have similar helpers elsewhere that account for this to get millisecond epoch times.

You're right, thanks for catching that. I was wrong in that function description. Zendesk's timestamps are in seconds, not milliseconds. I'll update the function description to say "...to a UNIX timestamp in seconds".

Also, the reason I stored the state as a date-time formatted string instead of an actual datetime objects was because when I retrieved the state after restarting the task, the retrieved state wasn't a datetime object but was the ISO formatted string. I'll update the helper function names to be clearer that they're accepting/returning a string and not a datetime object.

williamhbaker · 2024-10-02T14:14:50Z

source-zendesk-support/source_zendesk_support/streams.py

+    """
+    API docs: https://developer.zendesk.com/api-reference/ticketing/ticket-management/incremental_exports/#incremental-organization-export
+
+    Note: This stream theoretically can get stuck in a loop if 1000+ organizations are


Not a blocker, but per the API docs it kind of implies that this won't actually be possible, since it sounds like it'll keep returning results until one of them has a different timestamp?

... The 1000-item limit may be exceeded if items share the same timestamp ...

Yep, the API docs do make it seem like it wouldn't be an issue. But we have seen other streams that used this time-based incremental export method get stuck in a loop if 1000+ resources are updated at the same time (ex: we had to move the Tickets stream to a different pagination method because it was stuck requesting the same timestamp - PR was #1789).

I don't remember the exact number of tickets that were updated at the same time in that instance, but I think it was multiple thousands. I suspect there's a limit to how much higher that 1000-item limit can go. So if there are more resources updated in the same second than 1000 + whatever the flex amount is, the stream will get stuck.

…l time-based export The Organizations stream was previously using page numbers to retrieve results from Zendesk. This is an older pagination method for Zendesk Support, and we can only retrieve 10,000 documents using this endpoint. After 10k documents, Zendesk cuts us off. This meant that we weren't retrieving all organizations for users with more than 10k organizations. We can switch the Organizations stream to use "time-based" incremental exports. Instead of a page number, we use a specific start time to request all organizations that have been updated at or after that start time. Using this pagination strategy does have the potential downside of getting stuck in a loop if 1000+ organizations are updated at the exact same time, but there are not currently better ways to retrieve incremental updates for organizations. The new `SourceZendeskSupportIncrementalTimeExportStream` is an improvement on the existing `SourceZendeskSupportIncrementalExportStream` in almost every way. At a later date, we could consider refactoring out `SourceZendeskSupportIncrementalExportStream` since the new class should serve the same purpose even better. Doing so would require a lot of time intensive refactoring & inheritance un-tangling that are so I've left that for future work.

williamhbaker

LGTM

Alex-Bair added the change:unplanned Unplanned change, useful for things like doc updates label Oct 1, 2024

Alex-Bair requested a review from williamhbaker October 1, 2024 22:24

williamhbaker reviewed Oct 2, 2024

View reviewed changes

source-zendesk-support: include deleted groups in Groups stream

01c87f1

Alex-Bair force-pushed the bair/zendesk-support-fix-streams branch from 58ce1d6 to 221b14a Compare October 2, 2024 15:30

williamhbaker approved these changes Oct 2, 2024

View reviewed changes

Alex-Bair force-pushed the bair/zendesk-support-fix-streams branch from 221b14a to 387feb6 Compare October 2, 2024 15:35

Alex-Bair merged commit b84f7f0 into main Oct 2, 2024
66 of 75 checks passed

Alex-Bair deleted the bair/zendesk-support-fix-streams branch October 2, 2024 21:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

source-zendesk-support: fix Groups and Organizations streams #2007

source-zendesk-support: fix Groups and Organizations streams #2007

Alex-Bair commented Oct 1, 2024 •

edited

Loading

williamhbaker Oct 2, 2024

Alex-Bair Oct 2, 2024

williamhbaker Oct 2, 2024

Alex-Bair Oct 2, 2024

williamhbaker left a comment

source-zendesk-support: fix Groups and Organizations streams #2007

source-zendesk-support: fix Groups and Organizations streams #2007

Conversation

Alex-Bair commented Oct 1, 2024 • edited Loading

Groups

Organizations

williamhbaker Oct 2, 2024

Choose a reason for hiding this comment

Alex-Bair Oct 2, 2024

Choose a reason for hiding this comment

williamhbaker Oct 2, 2024

Choose a reason for hiding this comment

Alex-Bair Oct 2, 2024

Choose a reason for hiding this comment

williamhbaker left a comment

Choose a reason for hiding this comment

Alex-Bair commented Oct 1, 2024 •

edited

Loading

`Groups`

`Organizations`